GENERATING AND UTILIZING
ORGANIZED PROFILE INFORMATION 

FIELD OF THE INVENTION 

The present invention relates to the optimization of com- 
puter program instructions. More particularly, the present 
invention relates to a compiler program that utilizes a 
profiling optimization system. 

BACKGROUND OF THE INVENTION 

The development of the EDVAC computer system of
1948 is often cited as the beginning of the computer era.
Since that time, dramatic advances in both hardware (i.e., 
the computer's electronic components) and software (i.e., 
computer programs) have drastically improved the perfor-
mance of computer systems. However, modern software
programs, often containing millions of instructions, have 
become very complex when compared with early computer 
programs. Because the execution time (and hence, 
performance) of a computer program is very closely related 
to the number of instructions contained in the program, 
developers must continue to find new ways of improving the 
efficiency of computer software. 

Most modern computer programs are typically written in
a high-level language that is easy to understand by a human 
programmer. Special software tools, known as compilers, 
take the human-readable form of a computer program, 
known as "source code," and convert it into machine-
readable instructions, known as "object code." Because a
compiler generates the stream of instructions that are even- 
tually executed on a computer system, the manner in which 
the compiler converts the source code into object code 
affects the execution time of the computer program. 

As noted, the continual desire to use larger, faster and 
more complex software programs has forced system devel- 
opers to find new methods of improving the rate at which 
programs run. Software developers have focused a great deal 
of effort on developing methods of generating efficient 
computer instructions that can take full advantage of the 
hardware systems on which they are to be executed. Such
methods of improving the sequencing or placement of 
computer instructions within a computer program are 
referred to as optimizations. Numerous optimization tech- 
niques to improve the performance of software are known in 
the art today. 

Profiling is one technique that can be used to improve 
software optimization. Profiling uses predicted information 
on how a program will run to further optimize the computer 
program. For example, if it is known that certain blocks of 
code (i.e., distinct portions of a program) will be executed 
more often than other code blocks, performance may be 
enhanced by handling those blocks of code in a particular 
manner. (E.g., it might be desirable to position the code 
blocks in memory in a manner that improves the utilization 
of cache memory.) Thus, profiling seeks to improve optimi- 
zations and therefore system performance by using infor- 
mation regarding the expected behavior of blocks of code 
within a computer program. Specifically, by identifying 
frequently used code blocks and execution paths, software 
programs can be created to maximize the performance of the 
hardware on which they will run. 

In order to implement any profiling system, accurate 
profile or behavior information must be collected by first 
running the program on a set of inputs believed to represent 
typical operating conditions. Collecting profile information 
is referred to as "benchmarking." Once the profile informa-
tion is collected, it can then be used for optimization
purposes during a subsequent compilation of the source code 
used to build the program. Various known methods of
optimizing program code with profile data exist.

While most profiling mechanisms are fairly automated 
(e.g., compilers often include automated mechanisms for 
facilitating profiling), the actual process of profiling a soft-
ware program can become fairly time consuming and costly
as the size and complexity of the program grow. One
recognized limitation with profiling is that as software errors 
(i.e., bugs) are identified and corrected, the entire profiling 
of the program must often be repeated. Under typical 
conditions, it would be inappropriate to use old profiling
data with a modified software program because the execu-
tion paths traversed in the modified code may differ signifi-
cantly from the execution paths traversed in the original
code for which the profile data was gathered. Furthermore, 
if new procedures are added, additional instrumentation
code may be required to properly profile the program. Thus,
with respect to profiling, a potentially significant amount of 
overhead is created each time a software program is modi- 
fied. 

This problem may not be that serious as long as the
software program is relatively small. However, with larger
programs, such as operating systems that may contain thou- 
sands of source code modules and millions of lines of source 
code, the time and expense involved in re-profiling the
program each time a minor bug fix occurs may be signifi-
cant. Moreover, delivering a bug fix (i.e., a patch) to the
customer may then require shipping an entire new product, 
which creates additional overhead and expense to both the 
developer and the customer. At present, no other viable 
option exists (which does not result in serious performance
degradation) except to reprofile the entire program after each
source code modification. 

Thus, because of the economic drawbacks involved, 
present profiling methodologies cannot be effectively used 
with complex commercial software products, such as oper-
ating systems. Without a profiling system and method that
can support bug-fixing without significantly sacrificing sys- 
tem performance, the use of profiling in large systems will 
be limited. 

SUMMARY OF THE INVENTION 

The present invention provides a system and method that 
organizes profile information in a hierarchical fashion in 
order to eliminate the need to re-profile a program each time 
a software error is fixed. The apparatus and method dis-
closed herein cause profile information to be stored in
procedure specific storage areas during the benchmarking
phase and then, during the optimization phase, provide a
system for identifying and utilizing valid profile information
(and ignoring invalid profile information) as each procedure
is processed.
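
As a rough illustration of the idea in the preceding paragraph, the following Python sketch keeps one profile record per procedure and, at optimization time, uses a record only if it still matches the procedure being compiled. The names `ProcedureProfile`, `select_profiles` and the `signature` field are hypothetical, and comparing a stored fingerprint is only one assumed way of deciding validity; it is not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ProcedureProfile:
    """Profile record for one procedure (hypothetical layout)."""
    signature: str                      # assumed fingerprint taken at benchmark time
    counters: dict = field(default_factory=dict)


def select_profiles(stored, current_signatures):
    """Return usable counters per procedure, or None where data is missing or stale."""
    usable = {}
    for name, signature in current_signatures.items():
        profile = stored.get(name)
        if profile is not None and profile.signature == signature:
            usable[name] = profile.counters   # valid: feed back to the optimizer
        else:
            usable[name] = None               # invalid or absent: compile without profile
    return usable


# Data gathered during an earlier benchmarking run (illustrative values).
stored = {
    "main": ProcedureProfile("sig-main-v1", {"arc1": 50}),
    "foo": ProcedureProfile("sig-foo-v1", {"arc1": 7}),
}
# "foo" was modified after benchmarking and "bar" is a new procedure.
current = {"main": "sig-main-v1", "foo": "sig-foo-v2", "bar": "sig-bar-v1"}
usable = select_profiles(stored, current)
```

In this sketch only "main" keeps its counters; "foo" and "bar" are compiled without profile data, and nothing has to be re-benchmarked.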

The invention features a compiler system that includes a 
code generator for converting a first instruction module into 
a second instruction module; an instrumentation mechanism 
for inserting instrumentation code into the second instruc-
tion module and for initializing procedure specific data
storage areas for each procedure within the first instruction
module being compiled; and an optimization mechanism
that optimizes using procedure specific profile data. The
invention may further comprise a harvesting mechanism
that can organize procedure specific profile information
into files readable by the above-described optimization
mechanism.

The invention further features a method of generating and
utilizing profile data for a computer program that is built
from at least one source code module wherein the method
comprises the steps of: creating an instrumented executable
program that includes a process for generating procedure
specific profile data; benchmarking the instrumented
executable program and storing profile information in
procedure specific data areas; and optimizing the source code
module such that the procedures that have not been modified
since the benchmarking step will be processed using said
procedure specific profile data while the procedures that have
been modified since the benchmarking step will be processed
without procedure specific profile data.

The invention also features a system and method for
reordering procedures within an object module or executable
module that uses procedure specific profile data. In
particular, the system and method provide an improved
system for determining the order of procedures even in the
case where significant source code changes, such as the
addition or deletion of entire procedures, took place.

Therefore, it is an advantage of the present invention to
provide a profiling system that will permit bug fixes and
program improvements to occur without serious loss of
performance and without rebuilding or rebenchmarking an
entire software product. It is therefore a further advantage of
the present invention to provide a profiling system in which
profile data is stored in a hierarchical and separable fashion.
It is therefore a further advantage of the present invention to
have a separate unique profile data file for each source code
module that is used to build a software product. It is
therefore a further advantage of the present invention to
have a unique area for holding profile data for each
procedure of a source code module. It is a further advantage
of the present invention to provide an optimization system in
which module counter areas and procedure counter areas can
be checked for their existence and validity.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred embodiments of the present invention will
hereinafter be described in conjunction with the appended
drawings, where like designations denote like elements, and:

FIG. 1 depicts a block diagram of a computer system that
includes a compiler mechanism in accordance with a
preferred embodiment of the present invention.

FIG. 2 depicts an example of a module counter area in
accordance with a preferred embodiment of the present
invention.

FIG. 3 depicts an example of a procedure counter area in
accordance with a preferred embodiment of the present
invention.

FIG. 4 depicts a flow diagram of the fundamental phases
of a profiling system in accordance with a preferred
embodiment of the present invention.

FIG. 5 depicts a flow diagram of the instrumentation
phase in accordance with a preferred embodiment of the
present invention.

FIG. 6 depicts a flow diagram of the benchmarking phase
in accordance with a preferred embodiment of the present
invention.

FIG. 7 depicts a flow diagram of the optimization phase
within a compiler in accordance with a preferred
embodiment of the present invention.

FIG. 8 depicts a control flow graph for a software
procedure.

FIG. 9 depicts the control flow graph of FIG. 8 with
instrumentation blocks added.

FIG. 10 depicts the control flow graph of FIG. 8 with fully
annotated arcs.

FIG. 11 depicts a control flow graph for a revised program
procedure.

FIG. 12 depicts a flow diagram of an optimization phase
within a linking mechanism in accordance with a preferred
embodiment of the present invention.

DESCRIPTION OF THE PREFERRED
EMBODIMENT

The present invention relates to the optimization of
computer programs using profile data. For those that are not
experts in the field, the Overview below provides
general background information that will be helpful in
understanding the concepts of the invention.

Overview

1. Profiling

Many modern software development environments
include a profiling mechanism that uses information
collected about a program's runtime behavior (known as
profile data) to improve optimization of that program.
"Profile data" as used herein means any estimates of
execution frequencies in a computer program, regardless of
how the estimates are generated.

There are various profiling systems, or mechanisms for
generating profile data. Examples include instrumenting
profilers, trace-based profilers, and sampling profilers.
Instrumenting profilers operate by recompiling the program
with special instrumentation "hooks" placed at important
branch points. As the instrumented program executes, these
hooks cause data counters to be updated, accumulating the
branch decisions. Trace-based profilers operate by collecting
an execution trace of the instructions executed by the
program. Information is then reduced to a manageable size
to determine how often each branch in the program was
taken and not taken. A sampling profiler operates using a
hardware timer, periodically waking up a process that
records the address of the currently executing instruction.

While the present invention is generally concerned with
improvements in instrumenting profilers, it is recognized
that any other type of profiling system could be covered by
certain aspects of this invention.

As noted above (with regard to instrumenting profilers),
the program must first be retrofitted with instrumentation
code (i.e., hooks) that causes profile information to be saved
when the program is executed on a representative set of
inputs. Instrumentation code typically involves strategically
inserted instructions that count how often a block of code is
executed or how often a certain path is taken (i.e., how often
block A transfers control to block B). Once the profile
information is collected, it can then be used to optimize the
very program from which it was collected. Various methods
of optimizing program code with profile data are known in
the art. A typical instrumenting profiling system
includes (1) an instrumentation phase where a program is
retrofitted with "information collecting" instructions; (2) a
benchmarking phase where the program is run and profile
information is collected; and (3) an optimization phase
where the program is recompiled and optimized in light of
the profile information.

2. Compilers

Executable computer programs are typically constructed
by software programs called compilers. Initially, a
programmer first drafts a computer program in human
readable form (called source code) prescribed by the
programming language, resulting in a source code
instruction stream or
module. The programmer then uses mechanisms that change
the human readable form of the computer program into a 
form that can be understood by a computer system (called 
machine-readable form, or object code). Additional 
processing, such as linking, may then occur. Linking 
involves a process where multiple object modules are com- 
bined together to create a single executable computer pro- 
gram. The mechanisms described herein are typically called 
compilers; however, it should be understood that the term 
"compiler," as used within this specification, generically 
refers to any mechanism that transforms one representation 
of a computer program into another representation of that 
program. 

The machine-readable form, within this specification, is a 
stream of binary instructions (i.e., ones and zeros) that are 
meaningful to the computer. Compilers generally translate 
each human readable statement in the source code instruc- 
tion stream into zero or more intermediate language 
instructions, which are then converted into corresponding 
machine-readable instructions. Special compilers, called 
optimizing compilers, typically operate on the intermediate 
language instruction stream to make it perform better (e.g., 
by eliminating unneeded instructions, etc.). Some optimiz- 
ing compilers are wholly separate while others are built into 
a primary compiler (i.e., the compiler that converts the 
human readable statements into machine readable form) to 
form a multi-pass compiler. In other words, multi-pass 
compilers first operate to convert source code into an 
instruction stream in an intermediate language understood 
only by the compiler (i.e., as a first pass or stage) and then 
operate on the intermediate language instruction stream to 
optimize it and convert it into machine-readable form (i.e., 
as a second pass or stage). 

A compiler may reside within the memory of the com- 
puter which will be used to execute the object code, or may 
reside on a separate computer system. Compilers that reside 
on one computer system and are used to generate machine 
code for other computer systems are typically called "cross 
compilers." The methods and apparatus discussed herein 
apply to all types of compilers, including cross compilers 
and assemblers. 

Many of today's compilers include mechanisms for per- 
forming profiling operations. For example, compilers can 
automatically insert instrumentation code into the created 
object modules during the compilation process of an instru- 
mentation phase. Thus, an instrumented computer program 
can be automatically generated. Compilers can also auto- 
matically read in profile information during an optimization 
phase to create an optimized version of the computer pro- 
gram. 

An example of a profiling system and certain limitations 
associated therewith are discussed with reference to FIGS. 
8-11. FIG. 8 depicts a control flow graph (CFG) that 
represents a procedure having code blocks A, B, C, D and E. 
The CFG of FIG. 8 includes control paths or "arcs" repre- 
sented by arrows that depict how control may be transferred 
between blocks. For example, program control may be 
transferred from block A to block B or block C. However, 
control is never directly transferred from block A to block D 
or block E. Representing procedures in this manner is well 
known in the art of optimizations. 

Another data structure commonly used to represent the 
behavior of a program is a call graph. A call graph for a 
module consists of one or more nodes such that there exists 
a node for each procedure in the module. For example, if a 
module has three procedures, "main," "foo" and "bar," then 
the call graph will have three corresponding nodes. In 
addition to nodes, call graphs contain arcs placed between
the nodes if control can be transferred between the nodes. 
For example, if procedure "main" can transfer control to 
procedure "foo," an arc is placed from node "main" to node
"foo."
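
The call graph just described can be sketched in a few lines of Python. This is a minimal illustration only: the class name `CallGraph` and its methods are hypothetical, and the arc from "main" to "bar" is an assumption added to make the example less trivial (the text only states the arc from "main" to "foo").

```python
from collections import defaultdict


class CallGraph:
    """One node per procedure; a directed arc wherever control can be transferred."""

    def __init__(self, procedures):
        self.nodes = list(procedures)
        self.arcs = defaultdict(int)    # (caller, callee) -> weight, 0 until annotated

    def add_arc(self, caller, callee, weight=0):
        if caller not in self.nodes or callee not in self.nodes:
            raise KeyError("both procedures must be nodes in the graph")
        self.arcs[(caller, callee)] += weight

    def callees(self, caller):
        return sorted(dst for (src, dst) in self.arcs if src == caller)


graph = CallGraph(["main", "foo", "bar"])
graph.add_arc("main", "foo")    # stated in the text
graph.add_arc("main", "bar")    # assumed for illustration
```

The weight stored on each arc is left at zero here; it is the slot that profile data would later fill in.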

By determining which paths are most frequently traversed 
or which of the code blocks are most frequently executed, a 
compiler can use known techniques to efficiently optimize 
the procedure. Such information is provided to the compiler
in the form of profiling data. Once the compiler has the
profile information, the CFG representing the procedure 
being processed can be "annotated" (i.e., each arc in the 
CFG is given a relative weight). Similarly, arcs in call graphs 
can be annotated with weights indicating the estimated
relative frequencies of procedure calls. Moreover, advanced
techniques allow a CFG to be fully annotated by knowing 
the weights of only a subset of the arcs. For example, with 
respect to FIG. 8, the count information for the entire CFG 
can be determined by collecting count data for arcs labeled
1, 2 and 3.
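
One way to see why a subset of arc counts can suffice is flow conservation: for every block, the counts entering it must equal the counts leaving it. The Python sketch below applies that rule repeatedly to a hypothetical five-block graph. The arc layout and the measured values are assumptions chosen for illustration and are not taken from FIG. 8; `annotate`, `ENTRY` and `EXIT` are likewise hypothetical names.

```python
def annotate(arcs, measured):
    """Fill in unmeasured arc weights using flow conservation at each block."""
    weights = dict(measured)
    blocks = {b for arc in arcs for b in arc} - {"ENTRY", "EXIT"}
    progress = True
    while progress:
        progress = False
        for block in blocks:
            ins = [a for a in arcs if a[1] == block]
            outs = [a for a in arcs if a[0] == block]
            unknown = [a for a in ins + outs if a not in weights]
            if len(unknown) != 1:
                continue    # nothing to solve here yet (or already solved)
            known_in = sum(weights[a] for a in ins if a in weights)
            known_out = sum(weights[a] for a in outs if a in weights)
            arc = unknown[0]
            # The single missing arc balances inflow against outflow.
            if arc in ins:
                weights[arc] = known_out - known_in
            else:
                weights[arc] = known_in - known_out
            progress = True
    return weights


# Hypothetical CFG with virtual ENTRY and EXIT nodes.
arcs = [("ENTRY", "A"), ("A", "B"), ("A", "C"), ("B", "D"),
        ("C", "D"), ("C", "E"), ("D", "E"), ("E", "EXIT")]
# Only three arcs carry counters, mirroring the three instrumented arcs.
measured = {("A", "B"): 50, ("A", "C"): 70, ("C", "E"): 20}
weights = annotate(arcs, measured)
```

With these inputs the remaining five weights follow from the three measured ones, so only three counters need to be maintained at run time.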

FIG. 9 depicts a modified CFG in which instrumentation 
code has been inserted along arcs 1, 2 and 3. The instru- 
mentation code will typically include access to control flow 
counters that will be incremented each time one of these arcs
or paths is traversed during program execution. Once the
procedure as shown in FIG. 9 has been executed on a 
representative set of inputs, profile information from the 
counters can be extracted and later inputted to the optimizer 
during a subsequent compilation. 
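
The effect of such instrumentation can be mimicked in a short Python sketch. The procedure body, the placement of the three hooks and the input set are all hypothetical; a real compiler would insert the counter increments into the generated object code rather than into source.

```python
# One counter per instrumented arc (arc numbering is illustrative).
counters = {1: 0, 2: 0, 3: 0}


def hook(arc_id):
    """Instrumentation hook: record one traversal of the given arc."""
    counters[arc_id] += 1


def procedure(x):
    # Block A: decide which way control flows.
    if x % 2 == 0:
        hook(1)                 # arc from block A to block B
        result = x // 2         # block B
    else:
        hook(2)                 # arc from block A to block C
        result = 3 * x + 1      # block C
        if result > 10:
            hook(3)             # a further instrumented arc out of block C
            result -= 10
    return result


# "Benchmarking": run the instrumented procedure on a representative input set.
for value in range(10):
    procedure(value)
```

After the run, `counters` holds the raw profile data that would be harvested and fed back to the optimizer.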

FIG. 10 depicts a fully annotated CFG used during such
a profile feedback step. Here, it can be seen that each arc has 
a weight associated therewith. (E.g., the arc from block A to 
block B has a relative weight of 50.) By knowing such 
weights, the compiler can make optimization decisions such
as: how to order these blocks in memory; when to allow
early speculative execution of instructions; when it may be 
profitable to unroll the body of a loop; where to place 
register spill code to minimize the cost of spill; and many 
others. 

FIG. 11 depicts the procedure of FIG. 8 with a slight
modification, that is, the addition of a new code block C'.
While this modification appears to be fairly minor, an 
additional arc must be instrumented in order to provide 
accurate profile information for this procedure. In this case,
arcs 1, 2, 3 and 4 must now include instrumentation code in
order to provide a fully annotated CFG. In other words, an 
additional counter must be instrumented in order to accu- 
rately represent the behavior of this procedure. 
It can therefore be seen that any time a modification is
made to a procedure that causes a change to its control flow
graph, the profile information previously gathered for the 
procedure will often be at least partially invalid or incom- 
plete. Prior profiling systems are unable to use existing 
profile data in such cases, and must generate new profile data
for the whole program each time a procedure is modified, if
the benefits of profile-based optimizations are to be main- 
tained. 

The present invention seeks to provide a hierarchical 
management of profile data such that for systems that have
many procedures and source code modules, existing profile
information can still be used even if source code modifica- 
tions took place. This is achieved by storing the profile 
information in such a way that the feedback optimization 
step can identify invalid profile information and skip only
the profiling data of those procedures. Conversely, all those
procedures that had no modifications will continue to use the 
existing profile information. The result is a system that can 
use existing profile information while only experiencing a
small amount of performance degradation.

Further, the present invention can sometimes use existing
profile data for a procedure even when that procedure has
been modified. Referring again to FIG. 11, it can be seen that
the weights of all the arcs except C-D, C-C', and C'-D can
be accurately determined from the original profile data,
provided that the correspondence between the arcs in the
original CFG and the arcs in the modified CFG can be
determined. In such cases, the compiler can estimate the
weights of the arcs that cannot be reconstructed. For
example, since block C is known to execute 70 times,
the compiler might assign weights of 35 to each of arcs C-D,
C-C' and C'-D in FIG. 11. For the purposes of discussing the
present invention, profile data will be said to be "valid"
either if the corresponding procedure has not changed, or if
the data is considered sufficiently adequate (e.g., it is similar
enough to the original procedure) and the compiler can still
use the data in this fashion.

Detailed Description

Referring now to FIG. 1, a computer system 10 is shown
that includes a central processing unit (CPU) 12, memory 14
and a bus 13. Those skilled in the art will appreciate that the
mechanisms and apparatus of the present invention apply
equally to any computer system, regardless of whether the
computer system is a complex multi-user computing
apparatus, a single user workstation, a personal computer, or
an apparatus (e.g., a television, an automobile, etc.) having
a computer device embedded therein. In addition, it should
be recognized that other computer system components such
as cache, input/output (I/O) devices and network interfaces,
while not shown, may be included in computer system 10.
Additionally, although computer system 10 is shown to
contain only a single CPU 12, it should be understood that
the present invention applies equally to computer systems
that have multiple CPUs.

Pursuant to this invention, memory 14 is shown
containing a compiler 16 that is capable of receiving source
modules 22 and subsequently outputting at least two types of
object modules 26 or 27 that can be linked by linking
mechanism 18 to create at least two types of executable
program modules 28 and 32. In addition, harvesting
mechanism 20 can be used to store profile data 30 in a manner
acceptable to the compiler. Profile data 30, once collected,
can be fed back into compiler 16 or linking mechanism 18
via path 24. It is understood that compiler 16, linker 18,
harvesting mechanism 20, and any files generated therefrom,
in addition to residing in memory 14, may exist in the form
of a program product that resides on any type of storage
media such as magnetic disc, magnetic tape, CD-ROM and
other optical media, transmission media, etc.

Main memory 14 may also contain an operating system
and other application programs (not shown). Moreover, the
programs depicted in memory 14 need not always be
completely stored in main memory 14. Rather, slower mass
storage devices may be utilized to hold programs and/or
other files while they are awaiting processing or execution.
Furthermore, those skilled in the art will recognize that
programs and data need not reside on computer system 10,
but could reside on another computer system and engage in
cooperative processing through the use of well known
client-server mechanisms.

As noted, compiler 16 includes a code generator 15 that
can be directed to compile source modules 22 in a first
manner that includes a novel instrumentation mechanism 17
or in a second manner that includes a novel profile
optimization mechanism 19. The decision to utilize either
mechanism may typically be implemented during the
compilation procedure with the use of a command line
switch. The output of compiler 16, under either case, will be
an object module for each module 22 compiled. In the case
where the instrumentation mechanism 17 is used, object
modules 26 will be generated. In the case where the
profiling optimization mechanism 19 is used, object modules
27 will be generated. It should be recognized that while this
embodiment assumes a one-to-one correspondence between
source modules and object modules, this invention also
covers those compilers that generate multiple object files
from a single source file or those compilers that generate a
single object module from multiple source files.

The instrumentation mechanism 17 will typically be
utilized when the developer seeks to generate profile
information that can later be fed back to the compiler 16 to
optimize the program. When the instrumentation mechanism
17 is implemented, source code modules 22 are compiled to
create object modules 26 which in turn will be linked
together by linking mechanism 18 to create an instrumented
executable module 28. It should be recognized by those
skilled in the art that additional procedures and mechanisms
may be required to complete compilation (e.g.,
preprocessing, the linking of libraries with object modules,
etc.). In addition, it is recognized that linking mechanism 18
may be incorporated directly into compiler 16, and need not
be a separate mechanism. Once the instrumented executable
module 28 is created, it can be run on a representative set of
inputs.

The instrumentation mechanism 17 includes a mechanism
for inserting instrumentation code into the program to
provide profile information for procedures contained in each
source module. In addition, it includes a mechanism to set up
unique and novel storage areas for collecting profile
information. These storage areas, referred to herein as module
counter areas (MCA's) and procedure counter areas (PCA's),
are described in more detail with regard to FIGS. 2 and 3.
The initialization of these storage areas allows profile
information to be stored and managed in module-specific and
procedure-specific areas, which can later be easily examined
and retrieved.

Once the instrumented executable module 28 is executed
with representative inputs, profile data 30 can be collected
for later use. In this preferred embodiment, a harvesting
mechanism 20 is utilized to convert the collected profile data
into profile data files 30. Each profile data file 30 has a
one-to-one correspondence with a source module 22. Thus,
module 1 will have its own profile data file (MCA 1)
and module 2 will have its own profile data file (MCA
2). Any additional source modules will likewise have their
own unique profile data files. Each profile data file includes
one or more procedure counter areas. Each procedure
counter area within a profile data file corresponds to a
procedure from the corresponding source module.
Therefore, if source module 1 had three procedures "main,"
"foo" and "bar," its corresponding profile data file will have
three procedure counter areas identified as "main," "foo"
and "bar." The profile data files may be stored or archived
with their corresponding source modules for later retrieval.

The second feature of interest of compiler 16 is the profile
optimization mechanism 19 that reads in profile data files 30
during the compilation of source modules 22 to create object
modules 27. The object modules 27 can then be linked
together to create an optimized executable module 32. In
addition, it is possible for part of the optimization mecha-
nism 19 to reside within the linking mechanism 18 for
handling procedure packaging within the optimized
executable module 32. The optimization mechanism 19 can
identify and locate the appropriate profile data file 30 for the
source module 22 currently being compiled. Likewise, for
each procedure in the source module being compiled, the
profile optimization mechanism 19 will verify the existence
and validity of its corresponding PCA. Should the
optimization mechanism come across an invalid procedure
counter area, the profile input information for that particular
procedure will be skipped and that procedure will be compiled
without profiling. Thus, in the event that a particular
procedure within a particular source module is modified (e.g.,
for a bug fix, etc.), re-profiling of the entire program with all
its source modules will not be necessary. Rather, the
individual source module 22 can be recompiled using existing
profile information from its corresponding profile data file
30. Then, the optimization mechanism 19 will determine
whether the profile information for each procedure is valid,
and if so, will process the procedure accordingly. For those
procedures having invalid or nonexistent profile data,
profiling will be skipped.

Thus, this system allows individual source modules 22 to
be modified and recompiled into object modules 27 using
existing profile data. The recompiled modules can then be
linked with the existing object modules 27 using linking
mechanism 18. In addition, linking mechanism 18 can use
the collection of profile data files to build a weighted call
graph in order to determine an optimized procedure
packaging order. The systems and methods described herein
with regard to FIG. 1 are described in more detail with regard
to FIGS. 2-7 and 12.

Referring now to FIG. 2, a module counter area 40
corresponding to source module 1 of FIG. 1 is shown. The
module counter area (MCA) 40 includes a module counter
area index 42 and one or more procedure counter areas 44,
46 and 48. As noted, each procedure counter area
corresponds to a procedure in the source
module. The module counter area index 42 is used by the
optimization mechanism 19 to locate a specific procedure
counter area. FIG. 3 depicts a procedure counter area 46 of
the module counter area shown in FIG. 2. In particular, the

Referring now to FIGS. 4-7, various flow charts are
shown that describe preferred methods of implementing this
invention. FIG. 4 depicts an overview of the three phases
involved in the implementation of this invention. The first
phase, the instrumentation phase (step 68), involves the
process wherein compiler 16 inserts instrumentation code
into the module and sets up storage areas for collected
profile data. The benchmarking phase (step 70) involves the
process wherein the instrumented executable module 28 is
executed and profile data 30 is gathered therefrom. The final
phase, the optimization phase (step 72), involves the process
wherein the compiler 16 receives feedback information in
the form of profile data 30 to create an optimized executable
module 32.

Referring now to FIG. 5, a flow chart is shown describing
the instrumentation phase of this invention.
First, the compilation of an individual source code module
22 is performed by compiler 16, applying instrumentation
mechanism 17. For each source code module 22 being
compiled, an MCA is initialized (step 58). At this point, the
MCA is defined as a static data object (of as yet
undetermined size) that is included in the object module 26
generated by the compiler 16.

Next, each procedure within the source module 22 is
processed (step 60). Similar to the MCA described above, a
PCA is initialized for each procedure and is allocated from
within the MCA (step 62). Next, instrumentation code is
added at the appropriate places in the module being
compiled and counters are allocated in the PCA for each arc
that is to be instrumented (step 64). Similarly, instrumentation
code is added and counters are allocated within the PCA for
each procedure call identified (step 66). These steps are then
repeated for each procedure until the module is completely
processed. Once the source module is completely compiled,
additional source modules may be compiled in a similar
fashion and later linked together to create an instrumented
executable module 28. It should be understood that the
decision to implement different types of counters (e.g., for
each arc, for each procedure call, etc.) is not critical to the
implementation of this invention. To those skilled in the art,

PCA "foo" is shown to include header information 50, it is understood that other types of profiling information can 

control flow counters 52, direct call site counters 54 and be gathered. It is further understood that the concept of 

indirect call site counters 56. Control flow counters 52 are instrumenting along arcs in this description is used for 

used to measure arcs in a procedure's control flow graph, as exemplary purposes only. Instrumentation code may be 

shown in FIG. 9. Call site counters are used to measure 45 placed along arcs or within basic blocks, or in any location 

occurrences of calls to other procedures within a procedure deemed appropriate. 

body, (A direct call site counter is used when the procedure FIG. 6 depicts a flow diagram of the benchmark phase 

to be called is known at compile time, while an indirect call shown in FIG. 4. Pursuant to this embodiment, the instru- 

site counter is used when the called procedure's identity is mented program 28 is loaded (step 71), any instrumentation 

known only during program execution.) The header infor- 50 parameters are set up (e.g., counters are initialized) (step 

mation 50 provides general information regarding the pro- 73), the instrumented program is executed on a set of inputs 

cedure such as some type of identification, the number and believed to represent typical usage of the program (step 75), 

type of each counter, etc. Thus, by including header infor- and the profile data is harvested (step 77). The result is a 

mation 50, a PCA for a given procedure can be found even plurality of profile data files 30, each corresponding to a 

if procedures are added or deleted from the source code. 55 source module 22. 

While the PCA of FIG. 3 is shown to contain counters that FIG. 7 depicts a detailed flow diagram of the optimization 

are typically known and used for profiling purposes, any phase of FIG. 4 within compiler 16. Under this invention, 

type or number of counters may be implemented. one or more source code modules may be involved in the 

Finally, with regard to FIGS. 2 and 3, it should be optimization phase 72. For instance, if an optimized execut- 

recognized that the overall format and storage of these 60 able program 32 has not yet been created, the developer will 

counter areas is not critical. Rather, it is the ability to provide need to compile all of the source modules 22 with the 

a means by which an optimization mechanism can locate collected profile data. Alternatively, if a source module was 

profile information for individual source modules and their modified, and all of the remaining source modules were 

procedures. Thus, while this hierarchical system of using already compiled utilizing the profile data, then only that 

MCA's and as a preferred embodiment, it is recognized that 65 modified source code module need be recompiled in this 

other data management and storage facilities could also be optimization phase. In either event, source code modules are 

used. compiled individually by compiler 16 wherein optimization 
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mechanism 19 identifies the corresponding MCA for the 
source code module being compiled (step 74). The optimi- 
zation mechanism 19 then determines if there are procedures 
left to process (step 76). If no, the process is done (step 88). 
If yes, mechanism 19 processes the next procedure (step 78)
as follows. First, it attempts to determine if the procedure
has a valid corresponding PCA (step 80). If the answer is no,
no profile information is used for the optimization of that 
procedure (i.e., it is optimized without profile data) (step 
86). In this case, the process returns to step 76 to determine
if any procedures remain for processing. 

If it is determined that the procedure has a valid PCA, 
counter information is read from the PCA and the control 
flow graph being built by the compiler 16 is appropriately 
annotated (step 82). The procedure is then optimized, using
the profile data annotated on the procedure's CFG to guide 
optimization (step 84). Once complete, any remaining pro- 
cedures are then processed in a similar fashion. 
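
The per-procedure loop of FIG. 7 can be illustrated with a minimal Python sketch. The class names `PCA` and `Procedure`, the dictionary standing in for the module counter area index, and the particular validity test are all assumptions made for illustration; the description above does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PCA:
    """Hypothetical procedure counter area: header fields plus raw counters."""
    name: str
    num_counters: int                      # header: counters allocated at instrumentation time
    counters: List[int] = field(default_factory=list)


@dataclass
class Procedure:
    """Hypothetical compiler-side view of a procedure in the current source."""
    name: str
    required_counters: int                 # counters the current source would need


def optimize_module(procedures: List[Procedure],
                    mca: Dict[str, PCA]) -> Dict[str, Tuple[str, Optional[List[int]]]]:
    """Choose a strategy for every procedure in one module (steps 76 to 88).

    `mca` maps procedure names to PCAs and stands in for the MCA index.
    """
    decisions = {}
    for proc in procedures:                # steps 76 and 78: next procedure
        pca = mca.get(proc.name)           # step 80: look for a PCA
        valid = (pca is not None
                 and pca.num_counters == proc.required_counters
                 and len(pca.counters) == pca.num_counters)
        if valid:
            # steps 82 and 84: counts would annotate the CFG and guide optimization
            decisions[proc.name] = ("profile-guided", list(pca.counters))
        else:
            # step 86: optimize without profile data
            decisions[proc.name] = ("static", None)
    return decisions


# "bar" changed since profiling and "baz" is new, so only "foo" reuses its counts.
example_mca = {"foo": PCA("foo", 2, [90, 10]), "bar": PCA("bar", 3, [1, 2, 3])}
example_procs = [Procedure("foo", 2), Procedure("bar", 4), Procedure("baz", 1)]
example_result = optimize_module(example_procs, example_mca)
```

In this sketch a modified or newly added procedure simply falls back to the non-profiled path, while untouched procedures in the same module keep using their existing counts.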

Determining whether or not a procedure has a valid PCA 
may be accomplished by comparing a "signature" of the
procedure with information in the PCA. For example, the 
optimization mechanism 19 can compare the number of 
counters in the PCA with the appropriate number of counters 
required in the procedure being processed. The optimization 
mechanism 19 could also compare a check sum in the PCA
with a calculated check sum for the procedure. 
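
As an illustration of such a signature comparison, the following Python sketch stores a counter count and a checksum in a hypothetical PCA header and rechecks both at optimization time. The use of CRC-32 over a sorted list of control flow arcs, and the names `cfg_checksum`, `make_header` and `pca_is_valid`, are assumptions; the description leaves the choice of check sum open.

```python
import zlib


def cfg_checksum(arcs):
    """CRC-32 over a canonical text form of the control flow arcs (an assumed signature)."""
    text = ";".join(f"{src}->{dst}" for src, dst in sorted(arcs))
    return zlib.crc32(text.encode("utf-8"))


def make_header(arcs, num_call_sites):
    """Header written at instrumentation time: one counter per arc and per call site."""
    return {
        "num_counters": len(arcs) + num_call_sites,
        "checksum": cfg_checksum(arcs),
    }


def pca_is_valid(header, arcs, num_call_sites):
    """Compare the stored signature with one recomputed from the current procedure."""
    if header["num_counters"] != len(arcs) + num_call_sites:
        return False                       # cheap test first: counter count differs
    return header["checksum"] == cfg_checksum(arcs)


original_arcs = [("entry", "a"), ("a", "exit")]
header = make_header(original_arcs, num_call_sites=1)
```

Checking the counter count first rejects most structural edits cheaply; the checksum then catches edits that happen to leave the count unchanged.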

Further, the optimizer may determine that the profile data 
is "valid" in the broader sense; that is, portions of the 
program are sufficiently unchanged that the original profile
data may be applied to those portions, and static estimates 
may be applied to the remaining portions. 

An additional aspect of this invention is the ability to 
produce an optimized procedure packaging order even in the 
presence of significant source code changes, such as the
addition or deletion of procedures, without requiring repro- 
filing. Known in the art are existing methods that analyze a 
weighted call graph of an object module or executable 
module and rearrange the procedures in that module to 
improve spatial locality, thus making more efficient use of
memory paging systems. Such methods could be imple- 
mented with linking mechanism 18, or within compiler 16, 
or within some other stand-alone tool. FIG. 12 depicts a flow 
diagram showing this method in accordance with the present 
invention.

The first step is to gather counter information for proce- 
dure calls and use it to construct a call graph. For each 
source module used to build the object module or executable 
module being optimized, the optimization mechanism 
locates the source module's MCA (steps 92 and 94) and
reads the counter information from each PCA within the 
MCA (steps 96, 98, 100). Each procedure call counter 
contributes its weight to an arc from the calling procedure to 
the called procedure. If there are multiple calls to one 
procedure within a different procedure, the corresponding
weights are added. The result is a fully annotated call graph 
(step 100). Once the call graph has been fully constructed, 
a procedure packaging order is constructed, using one of the 
methods known today or developed in the future (step 102). 
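
The call graph construction just described can be sketched in Python as follows. The nested dictionary used to represent MCAs and PCAs and the function name `build_call_graph` are hypothetical; only the rule that multiple call sites to the same callee have their weights added comes from the description.

```python
from collections import defaultdict


def build_call_graph(mcas):
    """Accumulate call site counters into weighted caller-to-callee arcs.

    `mcas` has the assumed shape
        {module_name: {caller_name: [(callee_name, count), ...]}}
    where each inner list holds the call site counters of one PCA.
    """
    weights = defaultdict(int)
    for module_pcas in mcas.values():              # one MCA per source module
        for caller, call_counters in module_pcas.items():
            for callee, count in call_counters:
                # several call sites to one callee add up on a single arc
                weights[(caller, callee)] += count
    return dict(weights)


example_mcas = {
    "module1": {"main": [("foo", 5), ("foo", 7), ("bar", 1)]},
    "module2": {"foo": [("bar", 3)]},
}
example_graph = build_call_graph(example_mcas)
```

Here the two call sites from `main` to `foo` collapse into one arc of weight 12, which is the form a packaging-order algorithm would consume.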

The next step is to place the procedures in the output
object module or executable module according to the pack- 
aging order (step 104). However, if any source modules have 
changed since the profile data was gathered, it is possible 
that procedures have been added or deleted. Any procedures 
that exist in the call graph but have since been deleted from
the module will be omitted from the new packaging order. 
Any procedures that have been added (and do not exist in the
call graph) are added to the packaging order (step 106). In
a preferred embodiment, all such procedures are placed in an 
arbitrary order after all procedures that did appear in the 
packaging order. It is understood however that any other 
ordering of added procedures could likewise be used within 
the scope of this invention. 
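
A minimal Python sketch of this reconciliation step follows. The function name `reconcile_packaging_order` is hypothetical, and appending added procedures in their source order is just one choice of the "arbitrary order" the preferred embodiment allows.

```python
def reconcile_packaging_order(profiled_order, current_procedures):
    """Adapt a packaging order computed from old profile data to the current module.

    Procedures deleted since profiling are dropped; procedures added since
    profiling are appended after all procedures that appeared in the order.
    """
    current = set(current_procedures)
    kept = [p for p in profiled_order if p in current]        # omit deleted procedures
    placed = set(kept)
    added = [p for p in current_procedures if p not in placed]  # new, never profiled
    return kept + added


example_order = reconcile_packaging_order(
    ["main", "foo", "old_helper", "bar"],          # order derived from the call graph
    ["bar", "foo", "main", "new1", "new2"],        # procedures in the current source
)
```

With these inputs, `old_helper` is omitted and `new1` and `new2` follow the three profiled procedures, so no reprofiling is needed to obtain a usable layout.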

The embodiments and examples set forth herein were 
presented in order to best explain the present invention and 
its practical application and to thereby enable those skilled 
in the art to make and use the invention. However, those 
skilled in the art will recognize that the foregoing descrip- 
tion and examples have been presented for the purposes of 
illustration and example only. The description as set forth is 
not intended to be exhaustive or to limit the invention to the 
precise form disclosed. Many modifications and variations 
are possible in light of the above teaching without departing 
from the spirit and scope of the following claims. 

We claim: 

1. An apparatus comprising: 
a processing unit; 

a memory system, said memory system being connected 
to said processing unit; and 

a compiler program stored in said memory system for 
execution on said processing unit, said compiler pro- 
gram including: 

a code generator that converts a first instruction module 
having at least one procedure into a second instruc- 
tion module; 

an instrumentation mechanism that inserts instrumenta-
tion code into said second instruction module and
initializes a data object that includes a procedure 
counter area for each at least one procedure in said 
first instruction module; and 

an optimization mechanism that optimizes said second 
instruction module wherein said optimization 
mechanism includes a checking mechanism that 
determines if each at least one procedure has a 
corresponding and valid procedure counter area, and 
a reading mechanism that reads count information 
for each at least one procedure from said correspond- 
ing and valid procedure counter area. 

2. The apparatus of claim 1 wherein said data object 
further includes a module counter area wherein said module 
counter area has a one-to-one correspondence with said first 
instruction module. 

3. The apparatus of claim 2 wherein said optimization 
mechanism further includes a mechanism that constructs a 
call graph from count information stored in said procedure 
counter areas, a mechanism that analyzes said call graph to 
determine a procedure packaging order, a mechanism that 
places procedures according to the packaging order and 
omits procedures that no longer exist, and a mechanism that 
places procedures not specified by the packaging order 
among the already placed procedures. 

4. The apparatus of claim 3 wherein said mechanism that 
places procedures not specified by the packaging order 
places them in an arbitrary order following already placed 
procedures. 

5. The apparatus of claim 1 wherein said count informa- 
tion includes data gathered from a benchmark execution of 
a computer program that includes code built from said first 
instruction module. 

6. The apparatus of claim 1 wherein said checking mecha-
nism determines validity of each said at least one procedure
counter area by comparing a signature of each procedure
with information stored in each corresponding procedure
counter area.

7. The apparatus of claim 1 wherein said checking mecha-
nism determines validity of each of said at least one proce-
dure counter area by comparing a signature of each proce-
dure with information stored in each corresponding
procedure counter area, wherein said signature of each
procedure includes a total number of counters in said 
procedure. 

8. The apparatus of claim 1 wherein said checking mecha- 
nism determines validity of each of said at least one proce- 
dure counter area by comparing a signature of each proce-
dure with information stored in each corresponding
procedure counter area, wherein said signature of each 
procedure includes at least one functional value computed 
from attributes of said procedure. 

9. A program product comprising:
a recordable media; and
a compiler recorded on said recordable media accessible
by a computer system for execution on a central
processing unit, said compiler having:
a first processing mechanism that translates at least one
source code module into an output module, inserts
instrumentation code into said output module, and
initializes a procedure specific storage area for each
procedure in said at least one source code module,
wherein each said procedure specific storage area pro-
vides space for holding procedure specific profile infor-
mation generated during execution of said output mod-
ule; and

a second processing mechanism that translates said at
least one source code module into an optimized output
module, said second processing mechanism including a
mechanism that examines each procedure in said at
least one source code module, determines if procedure
specific profile information exists for each procedure,
determines if said existing procedure specific profile
information is valid, and utilizes said valid procedure
specific profile information to optimize said optimized
output module.

10. The program product of claim 9 wherein said first 
processing mechanism also initializes a module specific
storage area for each at least one source code module 
processed by said first processing mechanism, wherein each 
said module specific storage area holds said procedure 
specific storage areas associated with said source code 
module.

11. The program product of claim 10 further comprising 
a harvesting mechanism that converts the procedure specific 
profile information generated during execution of said out- 
put module into a plurality of profile files readable by said 
second processing mechanism such that each profile file
corresponds to a unique source code module. 

12. The program product of claim 11 wherein each of said 
plurality of profile files contains procedure specific profile 
information of the procedures in the source code module 
corresponding to the profile file.

13. The program product of claim 10 wherein said second 
processing mechanism further includes a mechanism that 
constructs a call graph from said procedure specific profile 
information stored in said procedure specific storage areas,
a mechanism that analyzes said call graph to determine a
procedure packaging order, a mechanism that places proce- 
dures according to the packaging order and omits procedures 
that no longer exist, and a mechanism that places procedures 
not specified by the packaging order among the already 
placed procedures.

14. The program product of claim 13 wherein said mecha- 
nism that places procedures not specified by the packaging
order places them in an arbitrary order following already
placed procedures. 

15. The program product of claim 9 wherein said proce- 
dure specific profile information includes count data. 

16. A program product comprising: 
a recordable media; and 

an optimizing program recorded on said recordable media 
that optimizes program modules using profile data, 
wherein said optimizing program includes: 
a mechanism that processes procedures within each
program module;
a mechanism that determines if a unique set of proce-
dure specific profile data exists for each procedure
processed;
a mechanism that determines if existing procedure
specific profile data is valid; and
a mechanism for reading and applying valid procedure
specific profile data.

17. The program product of claim 16 further comprising 
a mechanism that identifies a profile file that corresponds to 
the program module being optimized wherein said profile 
file contains said procedure specific data for the program 
module being optimized. 

18. The program product of claim 16 wherein said mecha- 
nism that determines if procedure specific profile data is 
valid examines the number of counters and compares the 
number with information from the related procedure. 

19. The program product of claim 16 wherein said mecha- 
nism that determines if procedure specific profile data is 
valid examines at least one functional value computed from 
attributes of the related procedure.

20. The program product of claim 16 wherein said opti- 
mizing program further includes a mechanism that con- 
structs a call graph from said procedure specific profile data, 
a mechanism that analyzes said call graph to determine a 
procedure packaging order, a mechanism that places proce- 
dures according to the packaging order and omits procedures 
that no longer exist, and a mechanism that places procedures 
not specified by the packaging order among the already 
placed procedures. 

21. The program product of claim 20 wherein said mecha- 
nism that places procedures not specified by the packaging 
order places them in an arbitrary order following the already 
placed procedures. 

22. A method of managing profile data for a computer 
program that is built with a compiler from a plurality of 
source code modules wherein said method does not require 
the reprofiling of the entire computer program when a small 
portion of the source code is modified, said method com- 
prising the steps of: 

creating an instrumented executable program by initially 
compiling and linking the plurality of source code 
modules using the steps of: 

initializing a module counter area for each source code
module being compiled;
inserting instrumentation code into procedures as
needed during the compilation of each source code
module; and

for each procedure receiving instrumentation code,
initializing a procedure counter area that is contained
within the module counter area of the source code
module being compiled;
benchmarking said instrumented executable program to
generate profile data that is stored in said procedure
counter areas within said module counter areas; and
creating an optimized executable program by compiling
and linking said plurality of source code modules using
the steps of:
for each source code module being compiled, identi-
fying the source code module's corresponding mod-
ule counter area;

for each procedure within the source code module
being compiled, attempting to identify the proce-
dure's corresponding procedure counter area;

for each procedure counter area identified, determining
if the procedure counter area is valid; and

reading profile count data from the procedure counter
area for optimizing purposes.

23. The method of claim 22 comprising the further step of 
ignoring profile data from an invalid procedure counter area. 

24. The method of claim 22 wherein said step of creating 
an optimized executable program further includes the steps 
of:

reading the profile call data within the procedure counter
area for each procedure call within the source code file
being compiled;
building a call graph from said profile call data;
analyzing said call graph to determine a packaging order;
placing procedures according to said packaging order
such that procedures that no longer exist are omitted;
and

placing procedures not specified by the packaging order
among the already placed procedures.

25. A method of building and maintaining an optimized 
computer program using existing profile information 
wherein profile information is stored in a unique procedure
counter area for each procedure within each source code
module, said method comprising the steps of: 

compiling at least one source code module with said 
existing profile information, said compiling step 
including the steps of: 

for each procedure within the at least one source code
module, determining if a procedure counter area 
exists; 

for each existing procedure counter area, determining if 
the existing procedure counter area is valid; and 

for each valid procedure counter area, reading said
profile information stored therein and optimizing 
accordingly. 



26. The method of claim 25 further comprising the 
compiling steps of: 

for each procedure in said at least one source code 
module, identifying each procedure call and reading its 
corresponding counter from the procedure counter 
area; and 

optimizing accordingly. 

27. The method of claim 25 further comprising the 
compiling steps of: 

building a control flow graph for each procedure in said 
source code module; and 

determining validity of each procedure counter area by 
comparing a signature of the control flow graph with 
information in said procedure counter area. 

28. The method of claim 25 further comprising the 
compiling steps of: 

building a call graph from said profile information; 
analyzing said call graph to determine a packaging order; 
placing procedures according to said packaging order
such that procedures that no longer exist are omitted;
and

placing procedures not specified by the packaging order 
among the already placed procedures. 

29. A method of creating and using profile data for a 
computer program that is built from at least one source code 
module having at least one procedure, said method com- 
prising the steps of: 

creating an instrumented executable program that 
includes a process for generating procedure specific 
profile data from said at least one source code module; 

benchmarking said instrumented executable program and 
storing profile information in procedure specific data 
areas; and 

optimizing said at least one source code module such that 
procedures that have not been modified since the 
benchmarking step will be processed using said proce- 
dure specific profile data and procedures that have been 
modified since the benchmarking step will be processed 
without procedure specific profile data.